Best Arm Identification in Restless Markov Multi-Armed Bandits

Authors

Abstract

We study the problem of identifying the best arm in a multi-armed bandit environment when each arm is a time-homogeneous and ergodic discrete-time Markov process on a common, finite state space. The state evolution of each arm is governed by the arm's transition probability matrix (TPM). A decision entity that knows the set of TPMs, but not the exact mapping of the TPMs to the arms, wishes to find the index of the best arm as quickly as possible, subject to an upper bound on the error probability. The decision entity selects one arm at a time sequentially, while all unselected arms continue to undergo state evolution (restless arms). For this problem, we derive the first-known instance-dependent asymptotic lower bound on the growth rate of the expected time required to find the best arm, where the asymptotics is as the error probability vanishes. Further, we propose a sequential policy that, for an input parameter $R$, forcibly selects an arm that has not been selected for $R$ consecutive time instants. We show that this policy achieves an upper bound that depends on $R$ and is monotonically non-increasing as $R\to \infty$. The question of whether, in general, the limiting value of the upper bound matches the lower bound remains open. We identify a special case in which the upper and lower bounds match. Prior works on best arm identification have dealt with (a) independent and identically distributed observations from the arms, and (b) rested Markov arms, whereas our work deals with the more difficult setting of restless Markov arms.
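The policy described above forcibly selects any arm that has gone $R$ consecutive instants without being sampled. As a rough illustration only (this is not the authors' actual policy, whose choice between forced selections is driven by sequential test statistics; the `scores` tie-breaking here is a placeholder assumption), a minimal sketch of such a forced-selection rule:

```python
def select_arm(gaps, scores, R):
    """Forced-selection rule: if any arm has gone R or more consecutive
    instants without being selected, sample it; otherwise sample the arm
    with the highest current score (placeholder for a real statistic)."""
    for arm, gap in enumerate(gaps):
        if gap >= R:
            return arm  # forced selection of a "starved" arm
    return max(range(len(scores)), key=lambda a: scores[a])

def run(num_arms, R, horizon, scores):
    """Simulate the selection sequence, tracking per-arm gaps since
    each arm was last selected."""
    gaps = [0] * num_arms
    history = []
    for _ in range(horizon):
        arm = select_arm(gaps, scores, R)
        history.append(arm)
        for a in range(num_arms):
            gaps[a] = 0 if a == arm else gaps[a] + 1
    return history
```

With the forced rule in place, no arm's observations ever go stale by more than $R$ instants, which is what lets the policy control the estimation error on restless arms; larger $R$ spends fewer samples on forced exploration, consistent with the upper bound being non-increasing in $R$.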


Related Articles

Best Arm Identification in Multi-Armed Bandits

We consider the problem of finding the best arm in a stochastic multi-armed bandit game. The regret of a forecaster is here defined by the gap between the mean reward of the optimal arm and the mean reward of the ultimately chosen arm. We propose a highly exploring UCB policy and a new algorithm based on successive rejects. We show that these algorithms are essentially optimal since their regre...


Best arm identification in multi-armed bandits with delayed feedback

We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...


Practical Algorithms for Best-K Identification in Multi-Armed Bandits

In the Best-K identification problem (Best-K-Arm), we are given N stochastic bandit arms with unknown reward distributions. Our goal is to identify the K arms with the largest means with high confidence, by drawing samples from the arms adaptively. This problem is motivated by various practical applications and has attracted considerable attention in the past decade. In this paper, we propose n...


Best-Arm Identification in Linear Bandits

We study the best-arm identification problem in linear bandit, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...


Multi-armed restless bandits, index policies, and dynamic priority allocation

This paper presents a brief introduction to the emerging research field of multi-armed restless bandits (MARBs), which substantially extend the modeling power of classic multi-armed bandits. MARBs are Markov decision process models for optimal dynamic priority allocation to a collection of stochastic binary-action (active/passive) projects evolving over time. Interest in MARBs has grown steadil...



Journal

Journal: IEEE Transactions on Information Theory

Year: 2023

ISSN: 0018-9448, 1557-9654

DOI: https://doi.org/10.1109/tit.2022.3230939